Nature Methods
○ Springer Science and Business Media LLC
Preprints posted in the last 7 days, ranked by how well they match Nature Methods's content profile, based on 336 papers previously published here. The average preprint has a 0.37% match score for this journal, so anything above that is already an above-average fit.
Feng, Y.; Deng, K.; Guan, Y.
Gene networks (GNs) encode diverse molecular relationships and are central to interpreting cellular function and disease. The heterogeneity of interaction types has led to computational methods specialized for particular network contexts. Large language models (LLMs) offer a unified, language-based formulation of GN inference by leveraging biological knowledge from large-scale text corpora, yet their effectiveness remains sensitive to prompt design. Here, we introduce Gene-Relation Adaptive Soft Prompt (GRASP), a parameter-efficient and trainable framework that conditions inference on each gene pair through only three virtual tokens. Using factorized gene-specific and relation-aware components, GRASP learns to map each pair's biological context into compact soft prompts that combine pair-specific signals with shared interaction patterns. Across diverse GN inference tasks, GRASP consistently outperforms alternative prompting strategies. It also shows a stronger ability to recover unannotated interactions from synthetic negative sets, suggesting its capacity to identify biologically meaningful relationships beyond existing databases. Together, these results establish GRASP as a scalable and generalizable prompting framework for LLM-based GN inference.
Sooknah, M.; Srinivasan, R.; Sankarapandian, S.; Chen, Z.; Xu, J.
Genome-wide association studies (GWAS) have transformed our understanding of human biology, but are constrained by the need for predefined phenotypes. We introduce Vector2Variant (V2V), a general-purpose framework that transforms any set of high-dimensional measurements (such as machine learning embeddings) into a genome-wide scan for associations, without requiring rigid specification of a phenotype. Rather than testing genetic variants against single traits, V2V finds the axis in multivariate space along which carriers and non-carriers maximally differ, and produces a continuous "projection phenotype" that can be interpreted by association with disease labels. The projection phenotypes correlate with orthogonal clinical biomarkers never seen during training, suggesting the learned axes capture biologically meaningful variation. We applied V2V to imaging, timeseries, and omics modalities in the UK Biobank and recovered established biology (like the role of CASP9 in renal failure) without the need for targeted measurements, alongside novel associations including a frameshift variant in LRRIQ1 (potentially protective for cardiovascular disease). V2V is computationally efficient at genome-wide scale, producing summary statistics and disease associations that facilitate target prioritization without the need for phenotype engineering.
Mille-Fragoso, L. S.; Driscoll, C. L.; Wang, J. N.; Dai, H.; Widatalla, T. M.; Zhang, J. L.; Zhang, X.; Rao, B.; Feng, L.; Hie, B. L.; Gao, X. J.
Obtaining novel antibodies against specific protein targets is a widely important yet experimentally laborious process. Meanwhile, computational methods for antibody design have been limited by low success rates that currently require resource-intensive screening. Here, we introduce Germinal, a broadly enabling generative pipeline that designs antibodies against specific epitopes with nanomolar binding affinities while requiring only low-n experimental testing. Our method co-optimizes antibody structure and sequence by integrating a structure predictor with an antibody-specific protein language model to perform de novo design of functional complementarity-determining regions (CDRs) onto a user-specified structural framework. When tested against four diverse protein targets, Germinal successfully designed functional antibodies across all targets and binder formats, testing only 43-101 designs for each antigen. Validated designs also exhibited robust expression in mammalian cells and high sequence and structural novelty. We provide open-source code and full computational and experimental protocols to facilitate wide adoption. Germinal represents a milestone in efficient, epitope-targeted de novo antibody design, with notable implications for the development of molecular tools and therapeutics.
Schwoebel, J.; Frasch, M.; Spalding, A.; Sewell, E.; Englert, P.; Halpert, B.; Overbay, C.; Semenec, I.; Shor, J.
As health systems begin deploying autonomous AI agents that make independent clinical decisions and take direct actions within care workflows, ensuring patient safety and care quality requires governance standards that go beyond existing medical device frameworks designed for human-in-the-loop prediction tools. This paper introduces the Healthcare AI Agents Regulatory Framework (HAARF), a comprehensive verification standard for autonomous AI systems in clinical environments, developed collaboratively with 40+ international experts spanning regulatory authorities, clinical organizations, and AI security specialists. HAARF synthesizes requirements from nine major regulatory frameworks (FDA, EU AI Act, Health Canada, UK MHRA, NIST AI RMF, WHO GI-AI4H, ISO/IEC 42001, OWASP AISVS, IMDRF GMLP) into eight core verification categories comprising 279 specific requirements across three risk-based implementation levels. The framework addresses critical gaps in health system readiness for autonomous AI including: (1) progressive autonomy governance with clinical accountability, (2) tool-use security for agents that independently access EHRs, medical devices, and clinical systems, (3) continuous equity monitoring and bias mitigation across diverse patient populations, and (4) clinical decision traceability preserving human oversight authority. We validate HAARF's enforcement capabilities through a scenario-based red-team evaluation comprising six adversarial scenarios executed under baseline (no middleware) and HAARF-guardrailed conditions (N = 50 trials each, Gemini 2.5 Flash primary with Claude Sonnet 4.6 cross-model validation). In baseline conditions, the agent model executes unauthorized tools in 56-60% of adversarial trials. Under the HAARF condition, deterministic middleware enforcement reduces the unauthorized-tool success rate to 0%, with 0% contraindication misses and 0% policy-injection success (95% Wilson CI [0.00, 0.07]).
Cross-model validation confirms identical security metrics, supporting HAARF's model-agnostic design. Mapping analysis demonstrates 48-88% coverage of major regulatory frameworks, with per-category FDA alignment ranging from 73% (C5, Agent Registration) to 91% (C3, Cybersecurity; C7, Bias & Equity). Initial validation with healthcare organizations shows a 40-60% reduction in multi-jurisdictional compliance burden and improved clinical safety governance outcomes. HAARF provides health systems with a practical, risk-stratified pathway for safe AI agent deployment, shifting from reactive compliance to proactive quality governance while maintaining rigorous patient safety standards and human-centered care principles.
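The reported interval for zero events in 50 trials can be checked with the standard Wilson score formula. The sketch below is illustrative only: the function name `wilson_interval` and the choice z = 1.96 are assumptions, not details taken from the preprint.

```python
import math


def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = events / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# Zero unauthorized-tool successes in N = 50 trials, as stated in the abstract.
low, high = wilson_interval(0, 50)
print(f"[{low:.2f}, {high:.2f}]")
```

With zero events the upper bound reduces to z^2 / (n + z^2), which is why a 0% observed rate in 50 trials still leaves roughly 7% as a plausible upper limit.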
Sauer, C. M.; Tovey, N.; Ptasinska, A.; Hughes, D.; Stockton, J.; Zumalave, S.; Rust, A. G.; Lynn, C.; Livellara, V.; Sevrin, F.; Himsworth, C.; Muyas, F.; Nicolaidou, M.; Parry, G.; Paisana, E.; Cascao, R.; Ahmed, S. W.; Yasin, S. A.; Portela, L. R.; Balasubramanian, P.; Burke, G. A. A.; Vedi, A.; Faria, C. C.; Marshall, L. V.; Jacques, T. S.; Hubank, M.; Hargrave, D.; George, S.; Angelini, P.; Anderson, J.; Chesler, L.; Beggs, A. D.; Cortes-Ciriano, I.
Cell-free DNA (cfDNA) profiling enables minimally invasive cancer detection and monitoring. We present SIMMA, a low-input single-molecule sequencing approach that enables multimodal whole-genome and high-depth targeted sequencing of the same cfDNA sample for both tumour-agnostic and tumour-informed liquid biopsy analysis. Across 792 plasma and cerebrospinal fluid cfDNA samples from 277 paediatric patients with diverse brain and extracranial tumours, SIMMA enabled tumour diagnosis, detection of driver mutations, and reconstruction of extrachromosomal DNA (ecDNA) months before clinical relapse. Using conformal prediction trained on genome-wide fragmentomics, genomic and epigenomic data, SIMMA predicts disease burden as a continuous variable and provides well-calibrated uncertainty estimates for each sample, achieving a limit of detection of ~100 ppm from low-pass whole-genome sequencing data. In summary, SIMMA establishes the clinical utility of multimodal cfDNA profiling with uncertainty quantification for individual patients and unlocks the potential of ecDNA as a liquid biopsy biomarker for disease detection and monitoring across diverse aggressive malignancies.
Ullman, T.; Krantz, D.; Avenel, C.; Lung, M.; Svedman, F. C.; Holmsten, K.; Ostling, P.; Ullen, A.; Stadler, C.
Effective predictive biomarkers for immune checkpoint inhibitor (ICI) therapy remain an unmet need across solid tumors. Here, we present an integrated spatial proteomics workflow that combines in situ proximity ligation assay with multiplexed immunofluorescence to directly resolve PD1/PDL1 signaling events at the level of defined cellular phenotypes and their spatial organization within intact tumor tissue. Applied as a proof of concept to tumor samples from patients with metastatic urothelial carcinoma treated with pembrolizumab, this approach reveals that PD1/PDL1 interactions specifically involving cytotoxic CD8+CD3+ T cells are significantly enriched in complete responders, while such interactions are rare in patients with progressive disease. This interaction-defined T cell subset achieves superior discrimination of clinical response compared to single-marker PDL1 expression or immune cell abundance alone. By integrating direct detection of protein-protein interactions with high-dimensional single-cell phenotyping, our workflow provides a mechanistically informed, spatially resolved biomarker of functional immune engagement. Beyond urothelial carcinoma, this platform establishes a generalizable framework for translating spatial signaling biology into predictive tools for immunotherapy response across tumor types.
Strobl, E. V.
Motivation: Complex disorders arise from multiple genetic mechanisms, but most drug-prioritization methods treat each disorder as a single phenotype and therefore miss locus-specific therapeutic opportunities. Results: We present SIEVE, a framework that decomposes complex disorders into genetically localized subphenotypes and links GWAS summary statistics, reference expression, and perturbational transcriptional profiles to prioritize compounds that target locus-anchored disease mechanisms. SIEVE also constructs genetically calibrated mechanism vectors, projects away nonspecific expression programs using negative anchors, and aggregates evidence across cell lines, doses, and time points to produce robust drug rankings. Across simulations and analyses of real data, SIEVE improves compound prioritization relative to existing methods and shows that subphenotype-aware, genetics-guided modeling can sharpen therapeutic discovery in heterogeneous disorders. Availability and Implementation: R implementation: github.com/ericstrobl/SIEVE.
Chandra, S.
Background. Pancreatic ductal adenocarcinoma (PDAC) has a five-year survival rate of approximately 12%, largely because it is typically diagnosed at an advanced stage. CT-based computational methods for early detection exist but rely on black-box deep learning or large texture feature sets without tissue-specific interpretability. Methods. We developed Virtual Spectral Decomposition (VSD), which applies six parameterized sigmoid functions S(HU) = 1/(1+exp(-alpha x (HU - mu))) to standard portal-venous CT, decomposing each pixel into tissue-specific response channels for fat (mu=-60), fluid (mu=10), parenchyma (mu=45), stroma (mu=75), vascular (mu=130), and calcification (mu=250). Dendritic Binary Gating identifies structural content per channel using morphological filtering, enabling co-firing analysis and lone firer identification. A 25-feature signature was extracted per patient. Three independent datasets were analyzed: NIH Pancreas-CT (n=78 healthy), Medical Segmentation Decathlon Task07 (n=281 PDAC, paired tumor/adjacent tissue), and CPTAC-PDA from The Cancer Imaging Archive (n=82, multi-institutional, with DICOM time point tags). The same six sigmoid parameters were used across all datasets without retraining. Results. VSD achieved AUC 0.943 for field effect detection (healthy vs cancer-adjacent parenchyma) and AUC 0.931 for patient-stratified tumor specification on MSD. On CPTAC-PDA, VSD achieved AUC 0.961 (6 features) and 0.979 (25 features) for distinguishing healthy from cancer-bearing pancreas on scans obtained prior to pathological diagnosis. All significant features replicated across datasets in the same direction: z_fat (d=-2.10, p=3.5e-27), z_fluid (d=-2.76, p=2.4e-38), fire_fat (d=+2.18, p=1.2e-28). Critically, VSD severity did not correlate with days-from-diagnosis (r=-0.008, p=0.944) across a range of day -1394 to day +249. 
Patient C3N-01375, scanned 3.8 years before pathological diagnosis, had VSD severity 1.87, well above the healthy mean of 0.94 +/- 0.33. The tissue transformation signature was temporally stable, indicating an early, persistent tissue state rather than a progressively worsening process. Conclusions. VSD with Dendritic Binary Gating detects a stable pancreatic tissue composition signature on standard CT that is present years before clinical diagnosis, validated across three independent datasets without parameter adjustment. The six sigmoid channels map to biologically meaningful tissue components through a fully transparent interpretability chain. The temporal stability of the signal implies a detection window of 3-7 years, consistent with known PanIN-3 microenvironment transformation timelines. VSD functions as a single-scan screening tool applicable to any abdominal CT performed during the pre-clinical window.
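The sigmoid decomposition described in the Methods of this abstract can be sketched as follows. The channel centres are the mu values quoted above; the steepness `alpha = 0.1` and the names `sigmoid_response` and `decompose_pixel` are assumptions for illustration, since the abstract does not state alpha or any implementation details. Dendritic Binary Gating and feature extraction are not shown.

```python
import math

# Channel centres (mu, in Hounsfield units) as listed in the abstract.
CHANNEL_MU = {
    "fat": -60.0,
    "fluid": 10.0,
    "parenchyma": 45.0,
    "stroma": 75.0,
    "vascular": 130.0,
    "calcification": 250.0,
}


def sigmoid_response(hu, mu, alpha):
    """S(HU) = 1 / (1 + exp(-alpha * (HU - mu)))."""
    return 1.0 / (1.0 + math.exp(-alpha * (hu - mu)))


def decompose_pixel(hu, alpha=0.1):
    """Map one HU value to six channel responses (alpha is an assumed value)."""
    return {name: sigmoid_response(hu, mu, alpha) for name, mu in CHANNEL_MU.items()}


# A pixel at 45 HU sits exactly on the parenchyma centre.
for name, value in decompose_pixel(45.0).items():
    print(f"{name:>13}: {value:.4f}")
```

Each channel equals 0.5 at its own centre and, because the stated function is monotonic in HU, a channel with a lower mu always responds at least as strongly as one with a higher mu for the same pixel.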
Mboya, G. O.
Machine learning models trained on observational data from one environment frequently fail when deployed in another, because standard learning algorithms exploit spurious correlations alongside causal ones. Invariant learning methods address this problem by seeking representations that support stable prediction across training environments, but their behavior on tabular data remains poorly characterized. We present CausTab, a gradient variance regularization framework for causal invariant representation learning on mixed tabular data. CausTab penalizes the variance of parameter gradients across training environments, providing a richer invariance signal than the scalar penalty used by Invariant Risk Minimization (IRM). We provide formal results showing that the gradient variance penalty is zero at causally invariant solutions and positive at solutions that rely on spurious features. Through experiments on synthetic data across three spurious-correlation regimes, four cycles of the National Health and Nutrition Examination Survey (NHANES), and four hospital systems in the UCI Heart Disease dataset, we demonstrate that: (1) IRM consistently degrades relative to standard empirical risk minimization (ERM) on tabular data, losing up to 13.8 AUC points in spurious-dominant settings, a failure we trace mechanistically to penalty collapse during training; (2) CausTab matches or exceeds ERM in every experimental condition; (3) CausTab achieves consistently better probability calibration than both ERM and IRM; and (4) invariant learning methods fail when environments differ in outcome prevalence rather than in spurious feature correlations, a boundary condition we characterize both empirically and theoretically. We introduce the Spurious Dominance Index (SDI), a practical scalar diagnostic for determining whether a dataset requires invariant learning, and validate it across all experimental settings
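The idea of penalizing gradient variance across environments can be illustrated with a small sketch. This is not the CausTab implementation: the abstract does not give the exact form of the penalty, so the logistic model, the function names, and the toy data below are all assumptions chosen to show the general mechanism.

```python
import math


def logistic_grad(w, X, y):
    """Mean gradient of the logistic loss with respect to w in one environment."""
    grad = [0.0] * len(w)
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi))
        p = 1.0 / (1.0 + math.exp(-z))
        for j, xj in enumerate(xi):
            grad[j] += (p - yi) * xj
    return [g / len(X) for g in grad]


def gradient_variance_penalty(w, envs):
    """Variance of per-environment gradients, summed over parameters."""
    grads = [logistic_grad(w, X, y) for X, y in envs]
    dim = len(w)
    mean = [sum(g[j] for g in grads) / len(grads) for j in range(dim)]
    return sum((g[j] - mean[j]) ** 2 for g in grads for j in range(dim)) / len(grads)


# Toy data (hypothetical): the second feature agrees with the label in
# env_a and disagrees in env_b, so its gradient differs between environments.
env_a = ([[1.0, 1.0], [1.0, -1.0]], [1, 0])
env_b = ([[1.0, -1.0], [1.0, 1.0]], [1, 0])
w = [0.0, 0.5]

same = gradient_variance_penalty(w, [env_a, env_a])
shifted = gradient_variance_penalty(w, [env_a, env_b])
print(same, shifted)
```

In a training loop such a penalty would be added to the pooled loss with a weighting coefficient; the penalty is zero when every environment yields the same gradient and grows when a feature's relationship to the label changes across environments.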
Omar, M.; Agbareia, R.; McGreevy, J.; Zebrowski, A.; Ramaswamy, A.; Gorin, M.; Anato, E. M.; Glicksberg, B. S.; Sakhuja, A.; Charney, A.; Klang, E.; Nadkarni, G.
Large language models are increasingly used for clinical guidance while their parent companies introduce advertising. We tested whether pharmaceutical ads embedded in the prompts of 12 models from OpenAI, Anthropic, and Google shift drug recommendations across 258,660 API calls and four experiments probing distinct epistemic conditions. When two drugs were both guideline-appropriate, advertising shifted selection of the advertised drug by +12.7 percentage points (P < 0.001), with some model-scenario pairs shifting from 0% to 100%. Google models were the most susceptible (+29.8 pp), followed by OpenAI (+10.9 pp), while Anthropic models showed minimal change (+2.0 pp). When the advertised product lacked evidence or was clinically suboptimal, models resisted. This reveals a structured vulnerability: advertising does not override medical knowledge but fills the space where clinical evidence is underdetermined. An open-response sub-analysis (2,340 calls across three representative models) confirmed that advertising restructures free-text clinical reasoning: models echoed ad claims at 2.7 times the baseline rate while maintaining high stated confidence and rarely disclosing the ad. Susceptibility was provider-dependent (Google: +29.8 pp; OpenAI: +10.9 pp; Anthropic: +2.0 pp). Because this bias operates within clinically correct answers, it is invisible to accuracy-based evaluation, identifying a class of AI safety vulnerability that standard testing cannot detect.
Uchida, Y.; Fujii, Y.; Swahn, H.; Ueda, M. T.; Chiba, T.; Matsushima, T.; Naito, Y.; Nakamichi, R.; Takahashi, K.; Olmer, M.; The RE-JOIN Consortium Investigators, ; Lotz, M.; Kochi, Y.; Asahara, H.
Osteoarthritis (OA) is a prevalent musculoskeletal disorder and a leading cause of global disability. Although meniscal damage is a major risk factor of OA pathogenesis, genetic regulatory studies have remained largely confined to articular cartilage. Here, we establish the first comprehensive expression quantitative trait locus (eQTL) map integrating whole-genome sequencing and bulk transcriptomics from human meniscus (n=112) and cartilage (n=113). Supported by single-nucleus multiomics (cartilage: 56,549 nuclei; meniscus: 34,343 nuclei), we uncovered highly tissue-specific genetic risk architectures. Colocalization with OA GWAS identified 27 meniscus-specific, 28 shared, and 20 cartilage-specific causal genes. Chromatin-informed fine-mapping and deconvolution elucidated distinct pathogenic mechanisms; notably, meniscus-specific signals converged on VEGFA via rare promoter variants and an enhancer in fibrochondrocyte progenitors, alongside a shared eQTL for CLEC18A. Exploratory analysis suggested candidate compounds to reverse pathogenic gene expression. Our findings underscore the meniscus as a distinct genetic driver, molecularly reinforcing OA as an entire joint organ failure.
Wang, X.; Hammarlund, N.; Prosperi, M.; Zhu, Y.; Revere, L.
Automating Hierarchical Condition Category (HCC) assignment directly from unstructured electronic health record (EHR) notes remains an important but understudied problem in clinical informatics. We present HCC-Coder, an end-to-end NLP system that maps narrative documentation to 115 Centers for Medicare & Medicaid Services (CMS) HCC codes in a multi-label setting. On the test dataset, HCC-Coder achieves a macro-F1 of 0.779 and a micro-F1 of 0.756, with a macro-sensitivity of 0.819 and macro-specificity of 0.998. By contrast, Generative Pre-trained Transformer (GPT)-4o achieves its highest scores, a macro-F1 of 0.735 and a micro-F1 of 0.708, under five-shot prompting. The fine-tuned model demonstrates consistent absolute improvements of 4%-5% in F1-scores over GPT-4o. To address severe label imbalance, we incorporate inverse-frequency weighting and per-label threshold calibration. These findings suggest that domain-adapted transformers provide more balanced and reliable performance than prompt-based large language models for hierarchical clinical coding and risk adjustment.
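Because the abstract reports both macro-F1 and micro-F1, a short sketch of how the two differ in a multi-label setting may help. The function names and toy labels are hypothetical, and treating an undefined F1 (no true or predicted positives) as 0.0 is an assumed convention.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred):
    """Macro-F1 averages per-label F1; micro-F1 pools counts over all labels."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))
    macro = sum(f1_from_counts(*c) for c in counts) / n_labels
    micro = f1_from_counts(*(sum(col) for col in zip(*counts)))
    return macro, micro


# Toy example: label 0 is common and predicted perfectly, label 1 is rare and missed.
y_true = [[1, 0], [1, 0], [1, 1], [0, 0]]
y_pred = [[1, 0], [1, 0], [1, 0], [0, 1]]
macro, micro = macro_micro_f1(y_true, y_pred)
print(macro, micro)  # 0.5 0.75
```

Macro-F1 weights every code equally, so rare codes pull it down when they are missed, while micro-F1 is dominated by frequent codes; this is why both are usually reported under severe label imbalance.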
Hakata, Y.; Oikawa, M.; Fujisawa, S.
Background. Federated learning (FL) enables collaborative model training across institutions without sharing patient-level data. However, standard FL algorithms such as FedAvg degrade under non-independently and non-identically distributed (non-IID) data, a prevalent condition when patient demographics, scanner hardware, and disease prevalence differ across hospital sites. Objective. We propose iPS-MFFL (Individualized Per-Site Meta-Federated Feature Learning), a federated framework with a hierarchical local-model architecture that addresses non-IID heterogeneity through (1) a shared feature extractor, (2) multiple weak-learner classification heads that can be trained with heterogeneous training objectives to promote complementary decision boundaries, (3) independent per-learner server aggregation so that each weak learner's parameters are averaged only with its counterparts at other clients, and (4) a lightweight meta-model, itself federated, that adaptively stacks the weak-learner outputs. Methods. We evaluate on the Brain Tumor MRI Classification dataset (7,200 images; 4 classes: glioma, meningioma, pituitary tumor, no tumor) partitioned across K = 5 simulated hospital sites using Dirichlet non-IID sampling (alpha = 0.3). Four baselines are compared: Local-only training, FedAvg, FedProx, and Freeze-FT. All experiments are repeated over three random seeds (13, 42, 2025) and evaluated using paired t-tests, Cohen's d effect sizes, and post-hoc power analysis.
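Dirichlet non-IID sampling, as mentioned in the Methods, is commonly implemented by drawing per-class site proportions from a Dirichlet distribution and splitting each class accordingly. The sketch below follows that common recipe; the function name, the use of normalized gamma draws, the seed, and the balanced synthetic labels are assumptions and not details from the preprint.

```python
import random


def dirichlet_partition(labels, num_sites, alpha, seed=0):
    """Split sample indices across sites with Dirichlet(alpha) class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    sites = [[] for _ in range(num_sites)]
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # A Dirichlet draw is a vector of gamma draws normalized to sum to 1.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_sites)]
        total = sum(gammas)
        start, cumulative = 0, 0.0
        for k in range(num_sites):
            cumulative += gammas[k] / total
            # The last site takes the remainder so every index is assigned once.
            end = len(idxs) if k == num_sites - 1 else min(len(idxs), round(cumulative * len(idxs)))
            sites[k].extend(idxs[start:end])
            start = end
    return sites


# Synthetic stand-in: 7,200 samples, 4 classes, assumed balanced for illustration.
labels = [i % 4 for i in range(7200)]
sites = dirichlet_partition(labels, num_sites=5, alpha=0.3, seed=13)
for k, members in enumerate(sites):
    per_class = [sum(1 for i in members if labels[i] == c) for c in range(4)]
    print(f"site {k}: {per_class}")
```

Smaller alpha values concentrate each class on fewer sites, so alpha = 0.3 typically produces strongly skewed class mixes per site.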
Steffen, F. D.; Lissat, A.; Alten, J.; Kriston, A.; Scheidegger, N.; Eckert, C.; Bodmer, N.; Schori, L.; Schühle, S.; Arpagaus, A.; Gutnik, S.; Manioti, D.; Bruderer, N.; Zeckanovic, A.; Västrik, I.; Nyiri, G.; Kovacs, F.; Thorhauge Als-Nielsen, B. E.; Attarbaschi, A.; Rademacher, A.; Elitzur, S.; Jacoby, E.; De Moerloose, B.; Svenberg, P.; Ancliff, P.; Sramkova, L.; Buldini, B.; Balduzzi, A.; Boer, J. M.; Mielcarek, M.; Ceppi, F.; Ansari, M.; Halter, J.; Schmiegelow, K.; Locatelli, F.; DelBufalo, F.; Stanulla, M.; Kulozik, A. E.; Schrappe, M.; Rohrlich, P.; Cave, H.; Baruchel, A.; von Stack
Children with relapsed or refractory acute lymphoblastic leukemia (ALL) require more effective and less toxic therapies. We established a prospective, multicenter Drug Response Profiling (DRP) registry (NCT06550102) integrating functional testing into precision-guided treatment. DRP was performed for 340 patients from 17 European countries with a turnaround time of two weeks. Image-based drug screening with over 135,000 unique perturbations revealed a heterogeneous landscape of ex vivo responses to 88 drugs on average. Ranking drug responses across the patient cohort defined individual drug fingerprints, identifying "DRP twins" by similarity in sensitivity and resistance independent of genetic ALL subtypes. Of 239 high-risk patients with follow-up, DRP-informed interventions were reported for 63 patients (26%). Patients received combination therapies based on venetoclax, tyrosine kinase inhibitors, trametinib, bortezomib or selinexor, resulting in objective clinical responses in 43 cases (68%). Precision-guided treatments allowed bridging to cellular therapies in 42 patients among whom 28 (67%) were still alive with a median follow-up of 21 months after DRP (IQR: 14.7-26.6 months). Top responders to venetoclax, ranked within the first tertile of the cohort, had superior 1-year event-survival compared to venetoclax non-responders (0.57 [95% CI, 0.39-0.85] vs. 0.25 [95% CI, 0.11-0.58]). Collectively, these findings demonstrate the feasibility and clinical relevance of functional profiling within an international network. This scalable framework enables individualized therapy selection for enrolment in adaptive precision trials for high-risk pediatric ALL.
Hayford, C. E.; Baleami, B.; Stauffer, P. E.; Paudel, B. B.; Al'Khafaji, A.; Brock, A.; Quaranta, V.; Tyson, D. R.; Harris, L. A.
Drug-tolerant persisters (DTPs) represent a major obstacle to durable responses in targeted cancer therapy. DTPs are commonly described as distinct single-cell states that survive drug treatment via reversible, non-genetic mechanisms and drive tumor recurrence. Recent work demonstrates that multiple DTPs can coexist, reflecting diversity in lineage, signaling programs, or stress responses. However, each DTP is still generally viewed as a uniform cellular phenotype. Building on our prior work describing a population-level DTP termed "idling" [Paudel et al., Biophys. J. (2018) 114, 1499-1511], here we present evidence supporting a fundamentally different view: that DTPs are not single-cell states, but rather heterogeneous populations composed of multiple sub-states with distinct division and death rates that balance to produce near-zero net population growth. Using single-cell transcriptomics and lineage barcoding, we identify multiple phenotypic states within idling DTP populations, with reduced heterogeneity compared to untreated populations, and find that idling DTP cells emerge from nearly all lineages. Transcriptomic and functional analyses further reveal altered ion-channel activity in idling DTPs, which we confirm experimentally. Moreover, drug-response assays reveal increased susceptibility of idling DTPs to ferroptosis, a non-apoptotic form of regulated cell death, indicating the emergence of vulnerabilities associated with drug tolerance. Altogether, our results support a population-level view of tumor drug tolerance in which DTPs comprise stable collections of phenotypic states, shaped by treatment-defined phenotypic landscapes, which are potentially vulnerable to subsequent interventions. 
This perspective implies that eradicating DTPs will require a fundamental shift away from cell-type-centric strategies toward sequential treatments that progressively reduce phenotypic heterogeneity by modulating the molecular and cellular processes that establish the DTP landscape, an approach previously termed "targeted landscaping."
Pinero, S. L.; Li, X.; Lee, S. H.; Liu, L.; Li, J.; Le, T. D.
Long COVID affects millions of people worldwide, yet no disease-modifying treatment has been approved, and existing interventions have shown only modest and inconsistent benefits. A key reason for this limited progress is that current computational drug repurposing pipelines do not match well with the clinical reality of Long COVID. These patients often have persistent, multisystemic symptoms and may already be taking multiple medications, making treatment safety a primary concern. However, most repurposing workflows still treat safety as a downstream filter and rely on disease-associated targets rather than causal drivers. They also assume that the findings of one analysis would generalize across the diverse presentations of Long COVID. We introduce SPLIT, a safety-first repurposing framework that addresses these limitations. SPLIT prioritizes safety at the start of the candidate evaluation, integrates complementary causal inference strategies to identify likely driver genes, and uses a counterfactual substitution design to compare drugs within specific cohort contexts. When applied to cognitive and respiratory Long COVID cohorts, SPLIT revealed three main findings. First, drugs with similar predicted efficacy could have very different predicted safety profiles. Second, the drugs flagged as unfavorable were often different between the two cohorts, showing that drug prioritization is phenotype-specific. Third, SPLIT flagged 18 drugs currently under active investigation in Long COVID trials as having unfavorable predicted profiles. SPLIT provides a practical framework to identify safer, more context-appropriate candidates earlier in the process, supporting more targeted and better-tolerated treatment strategies for Long COVID.
Oh, J.; Steele, A. G.; Scheffler, M.; Martin, C.; Sheynin, J.; Dietz, V. A.; Valdivia-Padilla, A.; Stampas, A.; Korupolu, R.; Karmonik, C.; Hodics, T. M.; Freyvert, Y.; Manzella, M.; Faraji, A. H.; Horner, P. J.; Sayenko, D. G.
Cervical spinal cord injury (SCI) causes profound and persistent loss of hand function, and effective neuromodulation strategies remain limited. We report the first-in-human implantation of a 32-contact cervical epidural paddle array in two individuals with severe chronic SCI. Individualized motor pool recruitment maps, derived from systematic bipolar and multipolar configurations, enabled person-specific stimulation parameters. Optimized stimulation restored volitional hand opening, closing and coordinated upper-limb movements that were previously unattainable. This approach achieved a >91% success rate in complex reach-grasp-lift-release sequences, supported by substantial gains in range of motion, grip, and pinch strength. Electrophysiological and kinematic analyses demonstrated parameter-dependent, selective recruitment of flexor and extensor motor pools. Personalized stimulation programs integrated with goal-directed activities enabled functional hand use in home and community settings, sustained over several months of continued autonomous use. These findings establish a mechanistically grounded and translational framework for restoring upper-limb function after chronic severe SCI.
Yang, Z.; Lyng, G. D.; Batra, S. S.; Tillman, R. E.
Medical concept extraction from electronic health records underpins many downstream applications, yet remains challenging because medically meaningful concepts, such as diagnoses, are frequently implied rather than explicitly stated in medical narratives. Existing benchmarks with human-annotated evidence spans underscore the importance of grounding extracted concepts in medical text. However, they predominantly focus on explicitly stated concepts and provide limited coverage of cases in which medically relevant concepts must be inferred. We present MedicalBench, a new benchmark for medical concept extraction with evidence grounding that evaluates implicit medical reasoning. MedicalBench formulates medical concept extraction as a verification task over medical note concept pairs, coupled with sentence level evidence identification. Built from MIMIC-IV discharge summaries and human verified ICD-10 codes, the dataset is curated through a multi stage large language model (LLM) triage pipeline followed by medical annotation and expert review. It deliberately includes implicit positives, semantically confusable negatives, and cases where LLM judgments disagree with medical expert assessments. Annotators provide sentence level evidence spans and concise medical rationales. The final dataset contains 823 high quality examples. We define two complementary evaluation tasks: (1) medical concept extraction and (2) sentence level evidence retrieval, enabling assessment of both correctness and interpretability. Benchmarking state-of-the-art LLMs and a supervised baseline reveals that performance remains modest, highlighting the difficulty of extracting implicitly expressed concepts. 
We further show that explicitly incorporating reasoning cues and prompting to extract implicit evidence substantially improves medical concept extractions, while performance is largely invariant to note length, indicating that MedicalBench isolates reasoning difficulty rather than superficial confounders. MedicalBench provides the first systematic benchmark for implicit, evidence-grounded medical concept extraction, offering a foundation for developing medical language models that can both identify medically relevant concepts and justify their predictions in a transparent and medically faithful manner.
Ben-Joseph, J.
Lightweight epidemic calculators are widely used for teaching and rapid scenario exploration, yet many omit the methodological detail needed for scientific reuse. We present a browser-native SIR calculator that exposes forward Euler and classical fourth-order Runge-Kutta (RK4) integration alongside epidemiologically interpretable outputs and a population-conservation diagnostic. The implementation is anchored to analytical properties of the deterministic SIR system, including the epidemic threshold, the peak condition, and the final-size relation. Benchmark experiments show that RK4 is essentially step-size invariant over practical discretizations, whereas Euler at a coarse one-day step overestimates peak prevalence by 3.97% and final size by 0.66% relative to a fine-step RK4 reference. These results demonstrate that browser-based tools can support publication-quality computational narratives when solver choice, diagnostics, and assumptions are treated as first-class outputs.
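As a rough illustration of the solver comparison described above, the following Python sketch integrates the normalized SIR equations with forward Euler and classical RK4 and reports peak prevalence, final size, a final-size residual, and the drift in S + I + R. The parameter values (beta = 0.3 and gamma = 0.1 per day, one initial infection per thousand) and all function names are illustrative assumptions rather than details of the calculator itself, so the percentages quoted in the abstract should not be expected to reproduce.

```python
import math
from typing import Callable, Dict, Tuple

State = Tuple[float, float, float]  # (S, I, R) as fractions of the population


def sir_rhs(state: State, beta: float, gamma: float) -> State:
    """dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I."""
    s, i, _ = state
    infections = beta * s * i
    recoveries = gamma * i
    return (-infections, infections - recoveries, recoveries)


def _advance(state: State, slope: State, h: float) -> State:
    """Return state + h * slope, component by component."""
    return tuple(x + h * k for x, k in zip(state, slope))


def euler_step(state: State, dt: float, beta: float, gamma: float) -> State:
    """One forward Euler step."""
    return _advance(state, sir_rhs(state, beta, gamma), dt)


def rk4_step(state: State, dt: float, beta: float, gamma: float) -> State:
    """One classical fourth-order Runge-Kutta step."""
    k1 = sir_rhs(state, beta, gamma)
    k2 = sir_rhs(_advance(state, k1, dt / 2), beta, gamma)
    k3 = sir_rhs(_advance(state, k2, dt / 2), beta, gamma)
    k4 = sir_rhs(_advance(state, k3, dt), beta, gamma)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def analytic_peak(beta: float, gamma: float, i0: float) -> float:
    """Peak prevalence from the SIR first integral, assuming R(0) = 0."""
    r0, s0 = beta / gamma, 1.0 - i0
    if r0 * s0 <= 1.0:  # below the epidemic threshold: I never grows
        return i0
    return i0 + s0 - (1.0 + math.log(r0 * s0)) / r0


def simulate(step: Callable[[State, float, float, float], State],
             beta: float, gamma: float, i0: float,
             dt: float, t_end: float) -> Dict[str, float]:
    """Integrate from (1 - i0, i0, 0) and collect summary diagnostics."""
    s0 = 1.0 - i0
    state: State = (s0, i0, 0.0)
    peak, drift = i0, 0.0
    for _ in range(round(t_end / dt)):
        state = step(state, dt, beta, gamma)
        peak = max(peak, state[1])
        drift = max(drift, abs(sum(state) - 1.0))  # conservation check
    s, _, r = state
    return {
        "peak_prevalence": peak,
        "final_size": r,
        # The exact solution satisfies S(t) = S(0) * exp(-(beta/gamma) * R(t)).
        "final_size_residual": abs(s - s0 * math.exp(-(beta / gamma) * r)),
        "max_conservation_drift": drift,
    }


if __name__ == "__main__":
    for name, step in (("euler", euler_step), ("rk4", rk4_step)):
        print(name, simulate(step, beta=0.3, gamma=0.1, i0=0.001, dt=1.0, t_end=300.0))
    print("analytic peak", analytic_peak(0.3, 0.1, 0.001))
```

Because the three derivatives sum to zero, both schemes preserve S + I + R up to floating-point rounding, so the conservation drift mainly guards against implementation mistakes; the first-integral residual and the analytic peak are the diagnostics that separate the two solvers.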
Bhansali, R.; Gorenshtein, A.; Westover, B.; Goldenholz, D. M.
Manuscript preparation is a critical bottleneck in scientific publishing, yet existing AI writing tools require cloud transmission of sensitive content, creating data-confidentiality barriers for clinical researchers. We introduce the Paper Analysis Tool (PAT), a free, multi-agent framework that deploys 31 specialized agents powered by small language models (SLMs) to audit manuscripts across multiple quality dimensions without external data transmission. Applied to three published clinical neurological papers, PAT generated 540 evaluable suggestions. Validation by two expert reviewers (R.B., A.G.) confirmed 391 actionable, high-value revisions (90% agreement), achieving a 72.4% overall usefulness accuracy spanning methodological, statistical, and visual domains. Furthermore, deterministic re-evaluation of 126 agent-suggested rewrite pairs using Phase 0 metrics confirmed text improvement: total word count decreased by 25%, passive voice prevalence dropped sharply from 35% to 5%, average sentence length decreased by 24%, long-sentence fraction fell by 67%, and the Flesch-Kincaid grade improved by 17%. Our validation confirms that systematic, agent-driven pre-submission review drives measurable improvements, successfully converting manuscript optimization from an opaque, manual endeavor into a transparent and rigorous scientific process.
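Deterministic text metrics of the kind listed above can be approximated with a few lines of Python. The sketch below is not PAT's implementation: the sentence splitter, the vowel-group syllable heuristic, the 30-word threshold for a "long" sentence, and all function names are assumptions made for illustration, and passive-voice detection is left out because it needs a syntactic parser. The Flesch-Kincaid grade uses the standard formula 0.39 (words/sentences) + 11.8 (syllables/words) - 15.59.

```python
import re
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive split on whitespace that follows '.', '!' or '?'."""
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]


def split_words(text: str) -> List[str]:
    """Alphabetic tokens; internal hyphens and apostrophes stay in the word."""
    return re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text)


def count_syllables(word: str) -> int:
    """Heuristic: count vowel groups, discounting a silent trailing 'e'."""
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level; returns 0.0 for empty input."""
    sentences, words = split_sentences(text), split_words(text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def text_metrics(text: str, long_sentence_words: int = 30) -> Dict[str, float]:
    """Word count, mean sentence length, long-sentence fraction, FK grade."""
    sentences = split_sentences(text)
    lengths = [len(split_words(s)) for s in sentences]
    n = len(sentences)
    return {
        "word_count": float(sum(lengths)),
        "avg_sentence_length": sum(lengths) / n if n else 0.0,
        "long_sentence_fraction": (
            sum(length > long_sentence_words for length in lengths) / n if n else 0.0
        ),
        "fk_grade": flesch_kincaid_grade(text),
    }


def relative_change(before: float, after: float) -> float:
    """Signed fractional change, e.g. -0.25 for a 25% decrease."""
    return (after - before) / before
```

Applying `text_metrics` to each original passage and its rewrite, then `relative_change` to each pair of values, gives before/after comparisons in the same form as the percentages reported above. The 72.4% usefulness figure is consistent with the stated counts, since 391 / 540 ≈ 0.724.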